Investigating opportunities for instruction-level parallelism for stack machine code

Author

  • Huibin Shi
Abstract

Today, many general-purpose register-file (GPRF) architectures implement instruction-level parallelism (ILP) techniques to improve performance. Less has been done in this area for the so-called ‘stack architecture’. Nonetheless, stack architectures have many advantages over GPRF architectures, and applying ILP techniques in the stack processor domain might ultimately achieve performance similar to, or better than, that of register machines. This thesis investigates evidence for naturally occurring ILP in stack code and the potential performance gains from exploiting it. Static analysis is the principal methodology applied here. In particular, this thesis:

  • Investigates, defines and measures parallelism in standard stack code.
  • Investigates the effects of certain existing stack-code optimisations.
  • Applies ILP techniques common to GPRF machines, in particular loop unrolling, superblock formation, and branch prediction, to stack code and investigates their effects.
  • Utilises a synthetic approach to superblock formation to estimate ILP conditions under a branch-prediction mechanism.

An ‘ideal stack machine’ with unlimited resources was assumed as a base model; various non-ideal models were then assumed, and the ILP performance gains were measured. For each model, varied instruction latencies were also assumed, so the effects of architectural variations could be examined. The thesis demonstrates that ILP exists in stack code and can potentially improve the performance of appropriate stack machines. Loop unrolling is shown to be effective in enhancing ILP for stack code, and local-variable optimisation is also shown to be important in partnership with this approach. The creation of superblocks (called “dynamic superblocks” or “artificial superblocks”) constructed through an assumed 1-level branch predictor is also shown to be effective. Overall, the thesis demonstrates that there is ample evidence of exploitable ILP in compiler-generated stack code, and that stack machines could be developed which rank considerably higher in performance than the existing stack processor paradigm as it stands.
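As an illustration of what “naturally occurring ILP in stack code” means, the following is a minimal sketch, assuming a simplified JVM-like opcode set, unit latencies and an ideal machine with unlimited resources. The opcodes, the dependence rules and the ideal_ilp helper are hypothetical illustrations, not the thesis’s actual instrumentation or methodology.

# Sketch: measure ideal ILP in a short stack-code fragment by building
# stack producer/consumer edges plus RAW edges through local variables,
# then dividing instruction count by the critical-path length.
from collections import namedtuple

# (pops, pushes, local read, local write) for each hypothetical opcode
Instr = namedtuple("Instr", "op pops pushes reads writes")

PROGRAM = [                        # roughly: a = b + c; d = e * f
    Instr("iload_1",  0, 1, 1, None),
    Instr("iload_2",  0, 1, 2, None),
    Instr("iadd",     2, 1, None, None),
    Instr("istore_3", 1, 0, None, 3),
    Instr("iload_4",  0, 1, 4, None),
    Instr("iload_5",  0, 1, 5, None),
    Instr("imul",     2, 1, None, None),
    Instr("istore_6", 1, 0, None, 6),
]

def ideal_ilp(program):
    stack = []                     # indices of instructions whose results are live on the stack
    last_write = {}                # local slot -> index of the producing instruction
    deps = [set() for _ in program]
    for i, ins in enumerate(program):
        for _ in range(ins.pops):              # stack producer -> consumer edges
            deps[i].add(stack.pop())
        if ins.reads is not None and ins.reads in last_write:
            deps[i].add(last_write[ins.reads])  # RAW through a local variable
        for _ in range(ins.pushes):
            stack.append(i)
        if ins.writes is not None:
            last_write[ins.writes] = i
    # ASAP scheduling with unit latency: level = length of the longest dependence chain
    level = [0] * len(program)
    for i in range(len(program)):
        level[i] = 1 + max((level[d] for d in deps[i]), default=0)
    return len(program) / max(level)

print("ideal ILP:", ideal_ilp(PROGRAM))        # 8 instructions over a 3-deep chain -> ~2.67

On this fragment the two expression chains share no operands, so the eight instructions collapse onto a three-deep dependence chain and the reported ideal ILP is roughly 2.7; real stack code, longer latencies and finite issue width all pull this figure down, which is what the thesis quantifies.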


Similar resources

Optimization for the Intel

The Intel® Itanium® architecture contains a number of innovative compiler-controllable features designed to exploit instruction-level parallelism. New code generation and optimization techniques are critical to applying these features to improve processor performance. For instance, the Itanium® architecture provides a compiler-controllable virtual register stack to reduce the ...


A predecoding technique for ILP exploitation in Java processors

Java processors have been introduced to offer hardware acceleration for Java applications. They execute Java bytecodes directly in hardware. However, the stack nature of the Java virtual machine instruction set imposes a limitation on the achievable execution performance. In order to exploit instruction-level parallelism and allow out-of-order execution, we must remove the stack completely. Thi...


JAViR – Exploiting Instruction Level Parallelism for JAVA Machine by Using Virtual Registers

The Java Virtual Machine architecture is a stack-based architecture. Because most Java instructions can operate only on the top of the stack, it is difficult to exploit instruction-level parallelism (ILP). In this paper, we introduce a new kind of storage, named the virtual register (VR), working together with the stack, to provide a simultaneous access mechanism for a wide-issue high-performance JAViR...
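The following is a minimal sketch of the general idea behind the virtual-register and stack-removal approaches described above: rewriting stack operations into a register form so that independent operations become explicit and can be issued out of order. It assumes the same simplified JVM-like opcodes as earlier; the stack_to_virtual_registers function and the v0, v1, ... naming are hypothetical illustrations, not the published JAViR or predecoding designs.

# Sketch: translate a JVM-like stack sequence into three-address code on
# virtual registers, removing the top-of-stack bottleneck.
def stack_to_virtual_registers(program):
    stack, out, next_vr = [], [], 0
    def fresh():
        nonlocal next_vr
        name = f"v{next_vr}"
        next_vr += 1
        return name
    for op in program:
        if op.startswith("iload"):             # push a local onto the stack
            vr = fresh()
            out.append(f"{vr} = local{op.split('_')[1]}")
            stack.append(vr)
        elif op in ("iadd", "imul"):           # binary arithmetic on the two top slots
            rhs, lhs = stack.pop(), stack.pop()
            vr = fresh()
            out.append(f"{vr} = {lhs} {'+' if op == 'iadd' else '*'} {rhs}")
            stack.append(vr)
        elif op.startswith("istore"):          # pop the top of stack into a local
            out.append(f"local{op.split('_')[1]} = {stack.pop()}")
    return out

code = ["iload_1", "iload_2", "iadd", "istore_3",
        "iload_4", "iload_5", "imul", "istore_6"]
for line in stack_to_virtual_registers(code):
    print(line)
# The two resulting chains (v0, v1, v2 and v3, v4, v5) share no registers,
# so a wide-issue core could execute them in parallel or out of order.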


Parallelism of Java Bytecode Programs and a Java ILP Processor Architecture

The Java programming language has been widely used to develop dynamic content in Web pages. The Java Virtual Machine (JVM) executes Java bytecode. For efficient transmission over the Internet, Java bytecode uses a stack-oriented architecture: instructions need not contain source and destination specifiers in their bytecodes. The Java bytecodes may be executed on various platforms by interpret...


Two Computer Systems Paradoxes: Serialize-to-parallelize, and Queuing Concurrent-writes

We present and examine the following Serialize-to-Parallelize Paradox: suppose a programmer has a parallel algorithm in mind; the programmer must serialize the algorithm, and is actually trained to suppress its parallelism, while writing code; later, however, compilation and runtime techniques are used to reverse the results of this serialization effort and extract as much parallelism as possi...



Journal title:

Volume   Issue

Pages  -

Publication date: 2006